Geographic coding of information has become an 
important tool in the analysis of data for federal, state, 
and local governments. Presently, there exist many 
individual street address records with valuable socio- 
demographic information, and frequently, analysts and 
planners desire to see these data aggregated by some 
type of areal unit (e.g., census tract, block, congressional 
district). Such aggregation permits a search for patterns 
with respect to the desired characteristics at various 
levels. Given the volume of records that often exists, 
it is imperative that a computer be available to perform 
this task. The U.S. Bureau of the Census (1970) has 
developed the ADMATCH system of computer programs 
designed to perform geocoding on a variety of records. 
However, we were unable to use this system of programs 
due to a host of difficulties specific to our needs. 

In the following paragraphs, we present a description 
of a system of three computer programs that can be 
viewed as an alternative to the ADMATCH system. 
Like ADMATCH, our system of programs is designed to 
match records from two distinct files: a master list of 
addresses with the appropriate geographical codes and 
the individual records to which these codes are to be 
assigned. This system of programs is used to transfer a 
tract and block code from a master address list contain- 
ing these codes to address records not previously con- 
taining them. Generally speaking, this master list of 
addresses is any reference file containing both addresses 
and geographic codes specific to those addresses. The 
computer programs are written in ANSI-COBOL. The 
interrelationships among these programs are shown in 
Figure 1. 

Program 1: Select Census Record—Core Required— 
37 KB. The Select Census Record program reads the 
master address file (in this case, the census GBF-DIME 
file, which is a specific reference file) and selects all 
desired records as defined by zip code.¹ The census file 
has a range of house numbers for the left side of the 
street and a range of numbers for the right side, with 
the left side being odd and the right side even. There- 
fore, two records are written for each street segment 
selected. The odd numbers have an assigned type code 
of 1, and even numbers have an assigned type code of 2. 
The output is condensed, containing only those fields 
needed for matching, plus tract and block numbers. 
The output also contains the address as it appears in 
the census GBF-DIME file and the address as it appears 
in the local address file (addresses one wishes to match), 
thus providing two separate formats by which to match 
records. 

Figure 1. Flowchart for match system. 

The output file is sorted into sequence according to 
the following characteristics, listed in order of impor- 
tance (major to minor): (1) record type (1 or 2), (2) street 
name (exact name, disregarding N., S., E., St., Rd., 
Cv., etc.), (3) lowest house number, and (4) zip code. 
Zip code is used for sorting, in order to separate streets 
in different areas that have the same name and series of 
house numbers. The output file from the Select Census 
Record program is input to Program 3, the Match 
Selected Records program. 
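
The selection, splitting, and sorting steps of Program 1 can be illustrated with a short Python sketch. This is not the ANSI-COBOL program itself. The record layout, the field names, and the assumption that each side of a segment carries its own tract and block code are ours, chosen only to make the logic concrete.

```python
from typing import NamedTuple, Tuple

class Segment(NamedTuple):
    """One street segment of a GBF-DIME-like file (hypothetical layout)."""
    street: str                    # exact name, no direction or suffix
    zip_code: str
    left_range: Tuple[int, int]    # odd side: (low, high)
    right_range: Tuple[int, int]   # even side: (low, high)
    left_geo: Tuple[str, str]      # assumed (tract, block) for the odd side
    right_geo: Tuple[str, str]     # assumed (tract, block) for the even side

class CensusRecord(NamedTuple):
    rec_type: int                  # 1 = odd numbers, 2 = even numbers
    street: str
    low: int
    high: int
    zip_code: str
    tract: str
    block: str

def select_census_records(segments, wanted_zips):
    """Keep segments in the wanted zip codes, write two records per
    segment, and sort by type, street name, lowest number, zip code."""
    selected = []
    for seg in segments:
        if seg.zip_code not in wanted_zips:
            continue
        for rec_type, rng, geo in ((1, seg.left_range, seg.left_geo),
                                   (2, seg.right_range, seg.right_geo)):
            selected.append(CensusRecord(rec_type, seg.street, rng[0], rng[1],
                                         seg.zip_code, geo[0], geo[1]))
    selected.sort(key=lambda r: (r.rec_type, r.street, r.low, r.zip_code))
    return selected
```

Because record type is the major sort field, all odd-side records precede all even-side records in the output.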

Program 2: Select Local Records—Core Required— 
53 KB. The Select Local Records program reads the 
local file (in this case, real estate records) and selects 
the desired records as defined by zip code. It also 
attempts to format the address as it appears in the 
census file and assigns record type codes (1 for odd and 
2 for even). 

The output file is written in sorted sequence by 
record type, street name (reformatted exact name), 
street number, and zip code. This file is also input to 
Program 3, the Match Selected Records program. If 
requested, a file of undesired records can be output to 
disk for future use. 
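
As a rough illustration of the reformatting done by Program 2, the Python sketch below reduces a free-form address to an exact street name and a record type code. The direction and suffix lists are hypothetical (the text names only N., S., E., St., Rd., and Cv. as examples), and the sketch assumes every address begins with a house number. The actual COBOL logic may differ.

```python
import re

# Hypothetical lookup sets; "W" is our own addition to the examples in the text.
DIRECTIONS = {"N", "S", "E", "W"}
SUFFIXES = {"ST", "RD", "CV"}

def parse_address(address):
    """Split an address such as '1234 N. Main St.' into
    (house_number, exact_street_name)."""
    tokens = re.sub(r"[.,]", " ", address.upper()).split()
    number = int(tokens[0])          # assumes a leading house number
    words = tokens[1:]
    if len(words) > 1 and words[0] in DIRECTIONS:
        words = words[1:]            # drop a leading direction
    if len(words) > 1 and words[-1] in SUFFIXES:
        words = words[:-1]           # drop a trailing street suffix
    return number, " ".join(words)

def record_type(house_number):
    """Return 1 for odd house numbers and 2 for even ones."""
    return 1 if house_number % 2 else 2

def select_local_records(addresses, wanted_zips):
    """addresses: iterable of (address_text, zip_code) pairs.
    Returns (type, street, number, zip) tuples in sort sequence."""
    selected = []
    for text, zip_code in addresses:
        if zip_code not in wanted_zips:
            continue
        number, street = parse_address(text)
        selected.append((record_type(number), street, number, zip_code))
    return sorted(selected)
```

Sorting the tuples directly reproduces the stated sequence of record type, street name, street number, and zip code.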

Program 3: Match Selected Records—Core Required— 
29 KB. The Match Selected Records program matches 
the two output files created by Programs 1 and 2 using a 
point evaluation system. The point system corresponds 
to the sort sequence, giving a higher value to the major 
sort field and a lower value to the minor sort field: 
(1) match on record type, value = 8; (2) match on street 
name, value = 4; (3) match on street number, value = 2; 
(4) match on zip code, value=1. These point values 
were purposely selected, as the sum of any combination 
of fields (two or three fields) will be a unique sum. A 
record is considered to be exactly matched when the 
total point value equals 6, 7, 14, or 15. A 13 is defined 
as a possible match, and all other combinations (sums) 
are defined as unmatched. All local records are written 
out, and those that are matches or possible matches are 
written with the census tract and block numbers. 
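
The point evaluation can be worked through in a few lines of Python. The weights and the matched, possible, and unmatched totals are taken from the description above. The function and constant names are ours.

```python
# Weights follow the sort sequence: major field highest, minor field lowest.
TYPE, NAME, NUMBER, ZIP = 8, 4, 2, 1

EXACT = {6, 7, 14, 15}   # totals treated as exact matches
POSSIBLE = {13}          # total treated as a possible match

def score(type_ok, name_ok, number_ok, zip_ok):
    """Sum the weights of the fields that agree."""
    return TYPE * type_ok + NAME * name_ok + NUMBER * number_ok + ZIP * zip_ok

def classify(total):
    """Map a point total to 'exact', 'possible', or 'unmatched'."""
    if total in EXACT:
        return "exact"
    if total in POSSIBLE:
        return "possible"
    return "unmatched"
```

Because the weights are powers of two, each total identifies exactly one combination of agreeing fields: 6 is street name plus street number, 7 adds zip code, 14 is record type plus street name plus street number, and 15 is all four fields. A total of 13 means that record type, street name, and zip code agree but the street number does not.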

The Match Selected Records program always holds 
the previous census record in a “save” area as it advances 
the census file. If it does not find an exact match on the 
current record, it checks to see if the previous record 
provides a better match. If neither is satisfactory, it 
advances the census file and checks again, as long as 
the match fields in the census file are not greater than 
those in the local file. A printed list of those records 
having no match is also provided. 
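
The single-pass logic with a "save" area can be sketched in Python as follows. This is our reading of the description, not a transcription of the COBOL program. In particular, we assume that a street number "matches" when it falls within the census record's low-to-high range, and that the comparison of match fields uses record type, street name, and number. All names are hypothetical.

```python
from typing import NamedTuple

class Census(NamedTuple):
    rec_type: int
    street: str
    low: int
    high: int
    zip_code: str
    tract: str
    block: str

class Local(NamedTuple):
    rec_type: int
    street: str
    number: int
    zip_code: str

EXACT, POSSIBLE = {6, 7, 14, 15}, {13}

def score(c, l):
    """Point total for one census record against one local record."""
    return (8 * (c.rec_type == l.rec_type)
            + 4 * (c.street == l.street)
            + 2 * (c.low <= l.number <= c.high)   # assumed range test
            + 1 * (c.zip_code == l.zip_code))

def match_files(census, local):
    """Both lists must already be in sort sequence.
    Returns (local_record, tract, block, total) for every local record."""
    results = []
    i = 0            # position in the census file
    saved = None     # previous census record (the "save" area)
    for loc in local:
        while True:
            current = census[i] if i < len(census) else None
            candidates = [c for c in (current, saved) if c is not None]
            best = max(candidates, key=lambda c: score(c, loc), default=None)
            total = score(best, loc) if best else 0
            if total in EXACT:
                break
            # Stop advancing once the census fields pass the local fields.
            if current is None or current[:3] > loc[:3]:
                break
            saved = current
            i += 1
        if total in EXACT or total in POSSIBLE:
            results.append((loc, best.tract, best.block, total))
        else:
            results.append((loc, None, None, total))
    return results
```

The census position is never moved backward, so each file is read once. The saved record matters when the census file has already advanced to the next street: the previous record may still yield a possible match for the current local record.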


Relative Advantages 

The major advantage of this system of programs is 
its relative simplicity. Anyone with a minimal knowledge 
of COBOL can modify the programs to process 
his or her files. Furthermore, as far as modification is 
concerned, the Select Local Records program must 
be changed only with respect to record length, block 
length, location, and length of fields in order to meet 
the individual requirements of one’s own data files. 
All that is required in the Select Census Record program 
is a change in the range of zip codes that one is 
selecting. If one wishes to process the complete file, 
the “zip code compare” statement may be removed. 
The only modification to the Match Selected Records 
program is to the “select county” file, which one must 
define exactly as the output file from the Select Local 
Records program. No additional passes of unmatched 
records are required, as our system of programs selects 
the best possible match on the initial pass. 


Verification and Accuracy 

The system of programs reviewed here has been 
verified by matching a local real estate file containing 
approximately 153,000 records to the U.S. Census 
Bureau’s geographic base file (GBF-DIME) of addresses. 
A match rate of approximately 95% was obtained, and 
this rate is comparable to that of the U.S. Census Bureau’s 
(1970) ADMATCH programs. 

The programs were tested on an IBM Model 360/30 
computer having 64 KB of storage (of which 10 KB is 
used by the systems supervisor) with three 2314 disk 
drives and two 2400 tape drives. These programs have 
since been run on a Univac 1100, with the only modifi- 
cation required being the appropriate changes in JCL. 
Of course, core requirements for the present programs 
could be reduced by (1) using utility rather than COBOL 
sorts, (2) reducing the blocking factor of the files, and/ 
or (3) reserving no alternative areas for input/output. 
However, the tradeoff would be slower processing. On 
the other hand, it should be noted that larger core 
storage availability could significantly speed up process- 
ing. 


Availability 

A printed listing of these programs is available at no 
cost. Contact either Rebecca F. Guy or Louis G. Pol, 
Department of Sociology, Memphis State University, 
Memphis, Tennessee 38152. 

NOTE 


1. GBF-DIME stands for the U.S. Bureau of the Census 
Geographic Base File/Dual Independent Map Encoding. 
This file contains street address ranges, as well as a host of geo- 
graphic identifying codes specific to the addresses within those 
ranges. 